Application of sound source separation methods to advanced spatial audio systems
This thesis belongs to the field of Sound Source Separation (SSS). It addresses the development
and evaluation of these techniques for their application in the resynthesis of high-realism sound scenes by
means of Wave Field Synthesis (WFS). Because the vast majority of audio recordings are preserved in two-channel
stereo format, special up-converters are required to use advanced spatial audio reproduction formats
such as WFS. This is because WFS needs the original source signals to be available in order to
accurately synthesize the acoustic field inside an extended listening area. Thus, object-based mixing is
required.
Source separation problems in digital signal processing are those in which several signals have been mixed
together and the objective is to find out what the original signals were. Therefore, SSS algorithms can be applied
to existing two-channel mixtures to extract the different objects that compose the stereo scene. Unfortunately,
most stereo mixtures are underdetermined, i.e., there are more sound sources than audio channels. This
condition makes the SSS problem especially difficult, and stronger assumptions have to be made, often related to
the sparsity of the sources under some signal transformation.
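For illustration, the underdetermined problem can be written as follows. This is a minimal formulation that assumes an instantaneous two-channel mixture of N > 2 sources and the W-disjoint orthogonality (one dominant source per time-frequency bin) assumption commonly used by sparse methods; the symbols are introduced here and are not taken from the thesis.

```latex
x_m(t) = \sum_{n=1}^{N} a_{mn}\, s_n(t), \qquad m \in \{1, 2\}, \quad N > 2
\\[4pt]
X_m(k, r) = \sum_{n=1}^{N} a_{mn}\, S_n(k, r) \qquad \text{(STFT bin } k\text{, frame } r\text{)}
\\[4pt]
S_n(k, r)\, S_{n'}(k, r) \approx 0 \;\; (n \neq n') \;\Rightarrow\; X_m(k, r) \approx a_{mj}\, S_j(k, r), \quad j = j(k, r)
\\[4pt]
\hat{S}_n(k, r) = M_n(k, r)\, X_1(k, r), \qquad
M_n(k, r) = \begin{cases} 1 & \text{if } j(k, r) = n \\ 0 & \text{otherwise} \end{cases}
```

Because only one source is assumed active per bin, the ratio X_2(k, r) / X_1(k, r) ≈ a_{2j} / a_{1j} depends only on the dominant source j(k, r). This is what allows per-bin localization features to be clustered, and the masked mixture recovers each source up to the scaling a_{1n}.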
This thesis is focused on the application of SSS techniques to the spatial sound reproduction field. As a result,
its contributions can be categorized within these two areas. First, two underdetermined SSS methods are
proposed to deal efficiently with the separation of stereo sound mixtures. These techniques are based on a
multi-level thresholding segmentation approach, which enables fast and unsupervised separation of
sound sources in the time-frequency domain. Although both techniques rely on the same clustering type, the
features considered by each of them are related to different localization cues, which enable the separation
of either instantaneous or real mixtures. Additionally, two post-processing techniques aimed at
improving the isolation of the separated sources are proposed. The performance achieved by
several SSS methods in the resynthesis of WFS sound scenes is then evaluated by means of
listening tests, paying special attention to the change observed in the perceived spatial attributes.
Although the estimated sources are distorted versions of the original ones, the masking effects
involved in their spatial remixing make artifacts less perceptible, which improves the overall
assessed quality. Finally, some novel developments related to the application of time-frequency
processing to source localization and enhanced sound reproduction are presented.
Cobos Serrano, M. (2009). Application of sound source separation methods to advanced spatial audio systems [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8969
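The multi-level thresholding idea can be sketched as follows. This is a minimal illustration, not the thesis implementation. It assumes a hypothetical per-bin panning feature atan2(|X_R|, |X_L|), three sources, and a two-threshold Otsu criterion chosen here as one possible multi-level thresholding rule. The function names are illustrative.

```python
import math

def histogram(values, bins, lo, hi):
    """Count feature values into `bins` equal-width bins over [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(max(int((v - lo) / width), 0), bins - 1)
        counts[idx] += 1
    return counts

def otsu_two_thresholds(counts):
    """Exhaustive two-threshold (three-class) Otsu search on a histogram.

    Returns bin indices (t1, t2) defining classes [0, t1), [t1, t2), [t2, B)
    that maximize the between-class variance. Maximizing sum_k w_k * mu_k**2
    is equivalent, because the global mean is fixed.
    """
    total = sum(counts)
    n_bins = len(counts)
    best_score, best = -1.0, (1, 2)
    for t1 in range(1, n_bins - 1):
        for t2 in range(t1 + 1, n_bins):
            score = 0.0
            for lo, hi in ((0, t1), (t1, t2), (t2, n_bins)):
                w = sum(counts[lo:hi])
                if w:
                    mu = sum(i * counts[i] for i in range(lo, hi)) / w
                    score += (w / total) * mu * mu
            if score > best_score:
                best_score, best = score, (t1, t2)
    return best

def separate_three_sources(mag_left, mag_right, bins=64):
    """Assign each T-F bin to one of three sources (hypothetical feature).

    Feature: atan2(|X_R|, |X_L|) in [0, pi/2], i.e. 0 for a bin panned hard
    left and pi/2 for one panned hard right. Returns (labels, thresholds).
    """
    feats = [math.atan2(r, l) for l, r in zip(mag_left, mag_right)]
    top = math.pi / 2
    t1, t2 = otsu_two_thresholds(histogram(feats, bins, 0.0, top))
    width = top / bins
    th1, th2 = t1 * width, t2 * width
    labels = [0 if f < th1 else 1 if f < th2 else 2 for f in feats]
    return labels, (th1, th2)
```

Each label defines a binary time-frequency mask for one source. The search needs no training data, and its cost depends only on the histogram size, not on the number of bins in the spectrogram. Both properties fit the "fast and unsupervised" goal stated above.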
Resynthesis of Acoustic Scenes Combining Sound Source Separation and WaveField Synthesis Techniques
Source Separation has been a subject of intense research in many signal processing applications, ranging
from speech processing to medical image analysis. Applied to spatial audio systems, it can be used to
overcome one fundamental limitation in 3D scene resynthesis: the need to have the independent
signals for each source available. Wave-field Synthesis (WFS) is a spatial sound reproduction system that can
synthesize an acoustic field by means of loudspeaker arrays and is also capable of positioning several
sources in space. However, the individual signals corresponding to these sources must be available, and
obtaining them is often a difficult problem. In this work, we propose to use Sound Source Separation techniques
to obtain different tracks from stereo and mono mixtures. Several separation methods have
been implemented and tested, one of which was developed by the author. Although existing
algorithms are far from achieving hi-fi quality, subjective tests show that optimal
separation is not necessary to obtain acceptable results in 3D scene reproduction.
Cobos Serrano, M. (2007). Resynthesis of Acoustic Scenes Combining Sound Source Separation and WaveField Synthesis Techniques. http://hdl.handle.net/10251/12515
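The sketch below illustrates why WFS needs each source signal separately. It computes per-loudspeaker delays and gains for one virtual point source placed behind a straight array. This is a deliberately simplified version of the 2.5D point-source driving function, and it rests on assumptions. It keeps the propagation delay r/c and the cos(φ)/√r amplitude term. It omits the √(jk) pre-equalization filter, the reference-line constant and edge tapering. The function name and geometry convention are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value at room temperature

def wfs_delays_and_gains(source, speakers, c=SPEED_OF_SOUND):
    """Simplified 2.5D WFS weights for one virtual point source.

    Geometry (hypothetical convention): loudspeakers lie on the line y = 0,
    the listening area is at y > 0 and the virtual source is behind the
    array (y < 0). Returns one (delay_seconds, gain) pair per loudspeaker.
    Omitted on purpose: the sqrt(jk) pre-equalization filter, the
    reference-line amplitude constant and tapering at the array edges.
    """
    xs, ys = source
    weights = []
    for x0, y0 in speakers:
        dx, dy = x0 - xs, y0 - ys
        r = math.hypot(dx, dy)  # source-to-loudspeaker distance
        cos_phi = dy / r  # angle between propagation direction and array normal (+y)
        if cos_phi <= 0.0:
            weights.append((0.0, 0.0))  # not illuminated by this source: muted
        else:
            weights.append((r / c, cos_phi / math.sqrt(r)))
    return weights
```

Each loudspeaker feed is the separated source signal, delayed by `delay_seconds` and scaled by `gain`. With a stereo mix and no separation, there is no individual signal to feed into these weights.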
Peer Assessment of Oral Presentations
The oral presentation process, besides being an important
instrumental competence within the European
Higher Education Area (EHEA), is a major issue
in the development of engineers' work. In fact,
the enhancement of their oral presentation skills allows
them to transmit their knowledge and work to the audience
effectively. In order to develop this competence,
oral presentation activities are usually included
within the program of subjects belonging to the
new EHEA-adapted degrees. The use of rubrics for
the assessment of these presentations allows students
to obtain an objective, clear and accurate view
of the criteria employed in the evaluation process.
Moreover, the use of these rubrics gives the students
the possibility to rate their peers' work, which also
helps them to develop higher cognitive skills such
as critical thinking and analytical capabilities.
In this paper we present an experience aimed at
developing these capabilities in first-year students. To
this end, the students themselves have been asked
to assess their peers' presentations. This assessment
has been conducted within the 'Engineering, Society
and University' subject, which is taught in several
degrees: Computer Science, Multimedia Engineering
and Telematics Engineering. In addition to the
description of this experience, the paper includes a
statistical analysis of the obtained results, showing
the correlation between the assessments made by
students and those made by the teachers.
Peer reviewed.
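The correlation analysis mentioned above can be illustrated with a short sketch. It assumes a hypothetical design in which each presentation receives several peer marks and one teacher mark. The peer marks are averaged, and the Pearson coefficient is then computed against the teacher marks. The data and function names are invented for illustration and are not results from the paper.

```python
import math

def mean_peer_scores(peer_scores):
    """Average the peer marks received by each presentation."""
    return [sum(marks) / len(marks) for marks in peer_scores]

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length score lists.
    Raises ZeroDivisionError if either list has zero variance."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

# Hypothetical data: three peer marks per presentation and one teacher mark.
peers = [[7, 8, 6], [9, 9, 8], [5, 6, 5], [8, 7, 8]]
teacher = [7.0, 9.0, 5.5, 7.5]
r = pearson(mean_peer_scores(peers), teacher)
```

A coefficient close to 1 would indicate that students rank presentations much as the teachers do. Note that it says nothing about systematic offsets, such as peers marking consistently higher.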
Maximum a Posteriori Binary Mask Estimation for Underdetermined Source Separation Using Smoothed Posteriors
Sound source separation has become a topic of intensive research in recent years. The research effort has been especially relevant for the underdetermined case, where a considerable number of sparse methods working in the time-frequency (T-F) domain have appeared. In this context, although binary masking seems to be a preferred choice for source demixing, the estimated masks differ substantially from the ideal ones. This paper proposes a maximum a posteriori (MAP) framework for binary mask estimation. To this end, class-conditional source probabilities according to the observed mixing parameters are modeled via ratios of dependent Cauchy distributions, while source priors are iteratively calculated from the observed histograms. Moreover, spatially smoothed posteriors in the T-F domain are proposed to avoid noisy estimates, showing that the estimated masks are closer to the ideal ones in terms of objective performance measures.
This work was supported by the Spanish Ministry of Science and Innovation under project TEC2009-14414-C03-01. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Jingdong Chen.
Cobos Serrano, M.; López Monfort, JJ. (2012). Maximum a Posteriori Binary Mask Estimation for Underdetermined Source Separation Using Smoothed Posteriors. IEEE Transactions on Audio, Speech and Language Processing. 20(7):2059-2064. doi:10.1109/TASL.2012.2195654
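The MAP masking pipeline (likelihood × prior → posterior → T-F smoothing → arg max) can be sketched as below. This is a simplified stand-in under stated assumptions, not the paper's model. A plain Cauchy density replaces the ratio of dependent Cauchy distributions. The source locations are assumed known. Priors are re-estimated as the mean posterior, an EM-style update used here in place of the histogram-based rule. Smoothing is a box average over a square T-F neighbourhood. All names are hypothetical.

```python
import math

def cauchy_pdf(z, loc, scale):
    """Cauchy density; a stand-in for the paper's ratio-of-Cauchy model."""
    return scale / (math.pi * ((z - loc) ** 2 + scale ** 2))

def box_smooth(grid, radius):
    """Average a 2-D list (freq x frame) over a (2*radius+1)^2 T-F
    neighbourhood, truncated at the borders. radius=0 returns a copy."""
    n_k, n_r = len(grid), len(grid[0])
    out = [[0.0] * n_r for _ in range(n_k)]
    for k in range(n_k):
        for r in range(n_r):
            acc, cnt = 0.0, 0
            for kk in range(max(0, k - radius), min(n_k, k + radius + 1)):
                for rr in range(max(0, r - radius), min(n_r, r + radius + 1)):
                    acc += grid[kk][rr]
                    cnt += 1
            out[k][r] = acc / cnt
    return out

def map_binary_masks(feat, locs, scale=0.1, iters=10, radius=1):
    """Return one binary mask (freq x frame grid of 0/1) per source.

    feat: per-bin mixing-parameter estimate (e.g. a level ratio)
    locs: assumed mixing parameter of each source
    """
    n_src, n_k, n_r = len(locs), len(feat), len(feat[0])
    lik = [[[cauchy_pdf(feat[k][r], locs[j], scale) for r in range(n_r)]
            for k in range(n_k)] for j in range(n_src)]
    priors = [1.0 / n_src] * n_src
    for _ in range(iters):
        # Posterior per bin via Bayes' rule with the current priors.
        post = [[[0.0] * n_r for _ in range(n_k)] for _ in range(n_src)]
        for k in range(n_k):
            for r in range(n_r):
                num = [priors[j] * lik[j][k][r] for j in range(n_src)]
                tot = sum(num)
                for j in range(n_src):
                    post[j][k][r] = num[j] / tot
        # Prior update: mean posterior over all bins (assumed rule).
        priors = [sum(map(sum, post[j])) / (n_k * n_r) for j in range(n_src)]
    post = [box_smooth(p, radius) for p in post]  # spatially smoothed posteriors
    masks = [[[0] * n_r for _ in range(n_k)] for _ in range(n_src)]
    for k in range(n_k):
        for r in range(n_r):
            best = max(range(n_src), key=lambda j: post[j][k][r])
            masks[best][k][r] = 1  # MAP decision: one source per bin
    return masks
```

With `radius=0`, an isolated outlier bin is assigned to whichever source its own feature favours. With `radius=1`, the neighbourhood average can override it, which is the "avoid noisy estimates" effect described above.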
An Efficient Implementation of Parallel Parametric HRTF Models for Binaural Sound Synthesis in Mobile Multimedia
The widespread use of mobile multimedia devices in applications like gaming, 3D video and audio reproduction, immersive teleconferencing, or virtual and augmented reality demands efficient algorithms and methodologies. All these applications require real-time spatial audio engines capable of dealing with intensive signal processing operations while facing a number of constraints related to computational cost, latency and energy consumption. Most mobile multimedia devices include a Graphics Processing Unit (GPU) that is primarily used to accelerate video processing tasks, providing high computational capabilities due to its inherent parallel architecture. This paper describes a scalable parallel implementation of a real-time binaural audio engine for GPU-equipped mobile devices. The engine is based on a set of head-related transfer functions (HRTFs) modelled with a parametric parallel structure, allowing efficient synthesis and interpolation while reducing the size required for HRTF data storage. Several strategies to optimize the GPU implementation are evaluated on a well-known type of processor present in a wide range of mobile devices. In this context, we analyze both the energy consumption and the real-time capabilities of the system by exploring different GPU and CPU configuration alternatives. Moreover, the implementation has been conducted using the OpenCL framework, guaranteeing the portability of the code.
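The sketch below shows what a "parametric parallel structure" can look like and why it maps naturally onto a GPU. It assumes a bank of second-order sections whose outputs are summed, plus an optional direct gain. This layout is typical of parallel filter designs but is not confirmed as the paper's exact structure. The branches share the input and keep no shared state, so each could run as its own GPU work-item. The mapping to OpenCL is an assumption, and this Python version runs sequentially. The tuple layout and names are hypothetical.

```python
def parallel_sos_filter(x, sections, direct_gain=0.0):
    """Filter x through a bank of parallel second-order sections.

    Each branch computes
        y_k[n] = b0*x[n] + b1*x[n-1] - a1*y_k[n-1] - a2*y_k[n-2]
    and the output is direct_gain*x[n] + sum_k y_k[n].
    sections: list of (b0, b1, a1, a2) tuples (hypothetical layout).
    """
    y = [direct_gain * v for v in x]
    for b0, b1, a1, a2 in sections:
        x1 = y1 = y2 = 0.0  # per-branch state: x[n-1], y_k[n-1], y_k[n-2]
        for n, xn in enumerate(x):
            yn = b0 * xn + b1 * x1 - a1 * y1 - a2 * y2
            y[n] += yn  # branches only meet at this final summation
            x1, y2, y1 = xn, y1, yn
    return y
```

Because the only interaction between branches is the final sum, the work can be split across parallel processing units without synchronization inside the recursion.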
Computer-based detection and classification of flaws in citrus fruits
In this paper, a system for quality control in citrus fruits is presented. In current citrus manufacturing industries, calliper and color are successfully used for the automatic classification of fruits using vision systems. However, the detection of flaws in the citrus surface is carried out by human inspection. In this work, a computer vision system capable of detecting defects in the citrus peel and also classifying the type of flaw is presented. First, a review of citrus diseases has been carried out in order to build a database of digitized oranges classified by the kind of fault, which is used as a training set. The segmentation of faulty zones is performed by applying the Sobel gradient to the image. Afterwards, color and texture features of the flaw are extracted considering different color spaces, some of them related to higher-order statistics. Several techniques have been employed for classification purposes: Euclidean distance to a prototype, to the nearest neighbor and to the k nearest neighbors. Additionally, a three-layer neural network has been tested and compared, obtaining promising results.
López Monfort, JJ.; Cobos Serrano, M.; Aguilera Martí, E. (2011). Computer-based detection and classification of flaws in citrus fruits. Neural Computing and Applications. 20(7):975-981. doi:10.1007/s00521-010-0396-2
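Two of the steps named above are sketched here: Sobel gradient segmentation and k-nearest-neighbour classification with the Euclidean distance. This is a minimal illustration on plain Python lists. The threshold, feature vectors and labels are hypothetical, and it does not reproduce the paper's colour and texture features. With `k=1` the classifier reduces to the nearest-neighbour rule.

```python
import math
from collections import Counter

def sobel_magnitude(img):
    """Gradient magnitude of a grey-level image (list of rows) with the 3x3
    Sobel kernels; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            gx = (img[i - 1][j + 1] + 2 * img[i][j + 1] + img[i + 1][j + 1]
                  - img[i - 1][j - 1] - 2 * img[i][j - 1] - img[i + 1][j - 1])
            gy = (img[i + 1][j - 1] + 2 * img[i + 1][j] + img[i + 1][j + 1]
                  - img[i - 1][j - 1] - 2 * img[i - 1][j] - img[i - 1][j + 1])
            out[i][j] = math.hypot(gx, gy)
    return out

def edge_mask(grad, threshold):
    """Candidate flaw boundaries: pixels whose gradient exceeds a threshold."""
    return [[1 if v > threshold else 0 for v in row] for row in grad]

def knn_classify(sample, train, k=3):
    """Label a feature vector by majority vote of its k nearest training
    samples under the Euclidean distance. train: list of (vector, label)."""
    ranked = sorted(train, key=lambda item: math.dist(sample, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

In a full pipeline, features would be extracted from the regions outlined by `edge_mask` before classification. Here the feature vectors are simply given as tuples.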
Peer Assessment of Oral Presentations
The oral presentation process, besides being an important instrumental competence within the European Higher Education Area (EHEA), is a major issue in the development of engineers' work. In fact, the enhancement of their oral presentation skills allows them to transmit their knowledge and work to the audience effectively. In order to develop this competence, oral presentation activities are usually included within the program of subjects belonging to the new EHEA-adapted degrees. The use of rubrics for the assessment of these presentations allows students to obtain an objective, clear and accurate view of the criteria employed in the evaluation process. Moreover, the use of these rubrics gives the students the possibility to rate their peers' work, which also helps them to develop higher cognitive skills such as critical thinking and analytical capabilities. In this paper we present an experience aimed at developing these capabilities in first-year students. To this end, the students themselves have been asked to assess their peers' presentations. This assessment has been conducted within the 'Engineering, Society and University' subject, which is taught in several degrees: Computer Science, Multimedia Engineering and Telematics Engineering. In addition to the description of this experience, the paper includes a statistical analysis of the obtained results, showing the correlation between the assessments made by students and those made by the teachers.
This work was funded by the Vice-Rectorate for Culture, Equality and Planning (Vicerrectorado de Cultura, Igualdad y Planificación) of the Universidad de Valencia, within the educational innovation project with file number 118/FO11/49.
Fast channel estimation in the transformed spatial domain for analog millimeter wave systems
Fast channel estimation in millimeter-wave (mmWave) systems is a fundamental enabler of high-gain beamforming, which boosts coverage and capacity. The channel estimation stage typically involves an initial beam training process in which a subset of the possible beam directions at the transmitter and receiver is scanned along a predefined codebook. Unfortunately, the high number of transmit and receive antennas deployed in mmWave systems increases the complexity of the beam selection and channel estimation tasks. In this work, we tackle the channel estimation problem in analog systems from a different perspective than previous works. In particular, we propose to move the channel estimation problem from the angular domain into the transformed spatial domain, in which estimating the angles of arrival and departure corresponds to estimating the angular frequencies of the paths constituting the mmWave channel. The proposed approach, referred to as the transformed spatial domain channel estimation (TSDCE) algorithm, exhibits robustness to additive white Gaussian noise by combining low-rank approximations and sample autocorrelation functions for each path in the transformed spatial domain. Numerical results evaluate the mean square error of the channel estimation and the direction-of-arrival estimation capability. TSDCE significantly reduces the former while exhibiting a remarkably low computational complexity compared with well-known benchmarking schemes.
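The correspondence between angles and spatial frequencies can be illustrated with a small example. It assumes a single path, a half-wavelength uniform linear array and complex AWGN. For a single complex exponential, the phase of the lag-1 sample autocorrelation is a classic estimator of its frequency. This sketch is not TSDCE: it omits the low-rank approximation, the multi-path handling and the analog beamforming constraints. The function names are hypothetical.

```python
import cmath
import math
import random

def ula_snapshot(theta, n_ant, noise_std=0.0, rng=None):
    """Single-path response of a half-wavelength ULA (hypothetical model):
    element n observes exp(j*pi*n*sin(theta)) plus complex AWGN."""
    rng = rng or random.Random(0)
    omega = math.pi * math.sin(theta)  # spatial (angular) frequency of the path
    s = noise_std / math.sqrt(2)  # per-component std of circular noise
    return [cmath.exp(1j * omega * n) + complex(rng.gauss(0, s), rng.gauss(0, s))
            for n in range(n_ant)]

def estimate_spatial_frequency(y):
    """Phase of the lag-1 sample autocorrelation, sum_n y[n+1]*conj(y[n])."""
    acc = sum(y[n + 1] * y[n].conjugate() for n in range(len(y) - 1))
    return cmath.phase(acc)

def frequency_to_angle(omega):
    """Invert omega = pi*sin(theta), clipping to the valid range."""
    return math.asin(max(-1.0, min(1.0, omega / math.pi)))
```

Averaging the lag-1 products over all antenna pairs is what provides the noise robustness. Each product carries the same phase increment, while the noise terms tend to cancel.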
Practical considerations for acoustic source localization in the IoT era: Platforms, energy efficiency, and performance
The rapid development of the Internet of Things (IoT) has brought important changes to the way emerging acoustic signal processing applications are conceived. While traditional acoustic processing applications have been developed for high-throughput computing platforms equipped with expensive multichannel audio interfaces, the IoT paradigm demands more flexible and energy-efficient systems. In this context, algorithms for source localization and ranging in wireless acoustic sensor networks can be considered an enabling technology for many IoT-based environments, including security, industrial, and health-care applications. This paper evaluates important aspects of the practical deployment of IoT systems for acoustic source localization. Recent systems-on-chip composed of low-power multicore processors combined with a small graphics accelerator (GPU) provide a notable increase in the computational capacity needed by intensive signal processing algorithms, while partially retaining the appealing low power consumption of embedded systems. Different algorithms and implementations over several state-of-the-art platforms are discussed, analyzing important aspects such as the tradeoffs between performance, energy efficiency, and exploitation of parallelism, taking into account real-time constraints.
This work was supported in part by the Post-Doctoral Fellowship from Generalitat
Valenciana under Grant APOSTD/2016/069, in part by the Spanish
Government under Grant TIN2014-53495-R, Grant TIN2015-65277-R, and
Grant BIA2016-76957-C3-1-R, and in part by the Universidad Jaume I under
Project UJI-B2016-20.
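Localization and ranging in acoustic sensor networks usually start from the time difference of arrival (TDOA) between sensors. The sketch below estimates it as the lag that maximizes the plain time-domain cross-correlation and converts it into a path-length difference. This is a generic baseline chosen for illustration. It is not one of the algorithms evaluated in the paper. Practical systems often use GCC-PHAT in the frequency domain instead, and the speed of sound used here is an assumed value. The function names are hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def estimate_delay(ref, sig, max_lag):
    """Lag (in samples) maximizing the cross-correlation
    sum_n ref[n] * sig[n + lag]. A positive result means `sig` lags `ref`."""
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        val = sum(ref[n] * sig[n + lag]
                  for n in range(len(ref)) if 0 <= n + lag < len(sig))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag

def range_difference(lag, fs, c=SPEED_OF_SOUND):
    """Convert a sample delay into a path-length difference in metres."""
    return lag / fs * c
```

The brute-force search costs O(N · L) operations for N samples and L candidate lags. This kind of cost motivates the performance and energy tradeoffs on embedded platforms discussed above.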
A Parallel Approach to HRTF Approximation and Interpolation Based on a Parametric Filter Model
© 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Spatial audio-rendering techniques using head-related transfer functions (HRTFs) are currently used in many different contexts such as immersive teleconferencing systems, gaming, or 3-D audio reproduction. Since all these applications usually involve real-time constraints, efficient processing structures for HRTF modeling and interpolation are necessary for providing real-time binaural audio solutions. This letter presents a parametric parallel model that allows us to perform HRTF filtering and interpolation efficiently from an input HRTF dataset. The resulting model, which is an adaptation of a recently proposed modeling technique, not only reduces the size of HRTF datasets significantly, but also allows for simplified interpolation and real-time computation on parallel processors. In order to discuss the suitability of this new model, an implementation on a graphics processing unit is presented.
This work was supported by the Spanish Ministry of Economy and Competitiveness under Grant TEC2012-37945-C02-02 and FEDER funds and by the UNKP-16-4-III New National Excellence Program of the Hungarian Ministry of Human Capacities. The work of J. A. Belloch was supported by GVA Postdoctoral Contract APOSTD/2016/069.
Ramos Peinado, G.; Cobos Serrano, M.; Bank, B.; Belloch Rodríguez, JA. (2017). A Parallel Approach to HRTF Approximation and Interpolation Based on a Parametric Filter Model. IEEE Signal Processing Letters. 24(10):1507-1511. https://doi.org/10.1109/LSP.2017.2741724
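The sketch below illustrates why a parallel parametric model can make interpolation simple. It assumes, as in fixed-pole parallel filter designs, that all directions in the dataset share one set of second-order denominators and differ only in their numerator weights. This is not confirmed here as the letter's exact formulation. Under that assumption, interpolating between two measured directions reduces to linearly interpolating the weights. Each branch output is linear in its numerator, so the interpolated filter produces exactly the same cross-fade of the two directions' outputs. Names and the (a1, a2) / (b0, b1) layout are hypothetical.

```python
def interpolate_weights(w_a, w_b, alpha):
    """Linearly interpolate numerator pairs (b0, b1) of two directions that
    share the same poles; alpha=0 gives w_a, alpha=1 gives w_b."""
    return [((1 - alpha) * a0 + alpha * b0, (1 - alpha) * a1 + alpha * b1)
            for (a0, a1), (b0, b1) in zip(w_a, w_b)]

def render(x, poles, weights):
    """Parallel second-order filter bank: branch k uses shared denominator
    (a1, a2) from `poles` and direction-specific numerator (b0, b1)."""
    y = [0.0] * len(x)
    for (a1, a2), (b0, b1) in zip(poles, weights):
        x1 = y1 = y2 = 0.0  # per-branch state
        for n, xn in enumerate(x):
            yn = b0 * xn + b1 * x1 - a1 * y1 - a2 * y2
            y[n] += yn
            x1, y2, y1 = xn, y1, yn
    return y
```

Only the numerator weights need to be stored per direction, which is consistent with the dataset-size reduction claimed above. In this sketch, interpolation costs just a few multiply-adds per branch.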